Abstract

We consider the problem of minimizing cost among one-to-one assignments of $n$ jobs onto $n$
machines. The random assignment problem refers to the case when the costs associated with performing
jobs on machines are random variables. Aldous established the expected value of the smallest cost, $A_n$,
in the limiting $n$ regime. However, the distribution of the minimum cost has not yet been established.
In this paper we conjecture some distributional properties of matchings in matrices. If this conjecture is
proved, it will establish that $\sqrt{n}(A_n - E(A_n)) \xrightarrow{d} N(0, 2)$. We also establish the limiting distribution
for a special case of the Random Assignment Problem.

I. Introduction 

Consider the problem of assigning $n$ jobs onto $n$ machines. Let $C_{ij}$ denote the cost of
performing job $i$ on machine $j$. Consider a 1-1 assignment, $\pi$, that matches the $n$ jobs to $n$
machines. Let

$$\sum_{i=1}^{n} C_{i,\pi(i)}$$

denote the cost associated with the matching (assignment) $\pi$. Further let $\pi^*$ denote the matching
that minimizes the cost among all $n!$ matchings. Define

$$A_n = \sum_{i=1}^{n} C_{i,\pi^*(i)}.$$

It has been shown that when the random variables $C_{ij}$ are independent and identically distributed,
the distribution of $A_n$ is only dependent on the value of the density function at the origin. Two
popular choices in the literature for the density function of $C_{ij}$ have been $U[0, 1]$ and $\exp(1)$. We
will assume throughout the rest of the paper that the $C_{ij}$ are i.i.d. $\exp(1)$ random variables.
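
To make the definition of $A_n$ concrete, the following is a minimal sketch that computes the minimum assignment cost by brute force over all $n!$ permutations and estimates $E(A_n)$ by simulation. The function names (`min_assignment_cost`, `sample_cost_matrix`, `estimate_mean`) are illustrative and not taken from the paper, and exhaustive search is only practical for very small $n$.

```python
import itertools
import random


def min_assignment_cost(cost):
    """Return min over permutations pi of sum_i cost[i][pi(i)] (brute force)."""
    n = len(cost)
    return min(
        sum(cost[i][pi[i]] for i in range(n))
        for pi in itertools.permutations(range(n))
    )


def sample_cost_matrix(n, rng):
    """Draw an n x n matrix of i.i.d. exp(1) costs."""
    return [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]


def estimate_mean(n, trials, seed=0):
    """Monte Carlo estimate of E(A_n); the seed is an arbitrary choice."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min_assignment_cost(sample_cost_matrix(n, rng))
    return total / trials
```

For example, `estimate_mean(3, 20000)` gives a simulated value that can be compared against exact results for small $n$.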

A. Recall of some results 

For large values of $n$, Lazarus [1] showed that $E(A_n) > 1 + \frac{1}{e}$. Later, Olin [2] improved the lower bound to
1.51 in her Ph.D. thesis. Walkup [3] established an upper bound of 3 for $E(A_n)$ in the large $n$
regime. In [4], Karp improved this upper bound from 3 to 2. Coppersmith and Sorkin improved
this bound further to 1.94 in [5].

Using the Replica Method, a technique developed by statistical physicists to study interactions
between particles, Mezard and Parisi argued that the limit of $E(A_n)$ was $\frac{\pi^2}{6}$. They also computed
the distribution of a randomly chosen entry that was part of the smallest matching. However,
this method makes assumptions that cannot be rigorously justified. They also claimed that the
assignment problem had the 'self averaging property', i.e. the distribution of $A_n$ concentrates
around the mean for large $n$.

In [6], Aldous rigorously established that the limit of $E(A_n)$ exists. Later, in [7], he established
that the limit was $\frac{\pi^2}{6}$, as predicted by the physicists. He also recovered the distribution for a
random entry in the smallest assignment. As further evidence for the physicists' approach,
Talagrand showed that the variance decayed at a rate that was lower bounded by - and upper 

bounded by 

B. Finite Random Assignment Problem 
For every finite $n$, Parisi conjectured that

$$E(A_n) = \sum_{i=1}^{n} \frac{1}{i^2}.$$

This was established last year simultaneously using very different approaches in [8] and [9].
The latter approach is a size-based induction on matchings, building from the smallest entry
(smallest matching of size 1) to the smallest matching of size $n$.
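
As a small numerical illustration, the sketch below evaluates Parisi's formula $E(A_n) = \sum_{i=1}^{n} 1/i^2$ exactly with rational arithmetic and compares it with the limiting value $\pi^2/6$. The helper name `parisi_sum` is hypothetical and chosen only for this example.

```python
import math
from fractions import Fraction


def parisi_sum(n):
    """Exact value of sum_{i=1}^{n} 1/i^2 as a Fraction."""
    return sum(Fraction(1, i * i) for i in range(1, n + 1))


def gap_to_limit(n):
    """Distance between pi^2/6 and the finite-n value (float)."""
    return math.pi ** 2 / 6 - float(parisi_sum(n))
```

Since the tail $\sum_{i>n} 1/i^2$ lies between $1/(n+1)$ and $1/n$, the gap to the limit shrinks like $1/n$; for instance `parisi_sum(2)` is `Fraction(5, 4)`.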

In [9], the authors establish the distribution of the increments from the smallest matching
of size $k$ to the smallest matching of size $k + 1$. Though the weight of the smallest matching of size $n$ is the
sum of these increments, correlations between these random variables prevent them from obtaining
the distribution of the smallest matching of size $n$. However, linearity of expectation was
sufficient for them to get the expected value of the smallest matching of size $n$. In this paper,
we conjecture the exact nature of these correlations in the large $n$ regime. These conjectures
imply that

$$\sqrt{n}(A_n - E(A_n)) \xrightarrow{d} N(0, 2).$$

C. Results on the limiting distribution 

In [7], Aldous commented that one would expect the limiting distribution to be Gaussian. In
[10], Alm and Sorkin conjectured that the limiting variance of $\sqrt{n}(A_n - E(A_n))$ is 2. The
conjecture regarding the variance, according to the authors, is based on a communication
between Janson and the authors in which Janson guessed the exact distribution for every finite
$n$. This guess turned out to be incorrect for $n \ge 3$ but seemed very close to the true distribution.

The conjecture in this paper regarding the correlations in the large $n$ regime, when applied
to finite $n$, yields distributions that share many terms with Janson's guess.
However, the finer nature of our conjectures and the differences in some terms help us conclude
rather easily that the limiting distribution is Gaussian.

In the last section of this paper, we consider a special cost matrix in which $n - 1$ diagonal
entries are zero and the rest are i.i.d. $\exp(1)$ entries. We identify the scaled limiting distribution
for this case. The limiting distribution follows from a very simple case of a theorem in [11] and
a connection between the form of the distribution and the distribution of shortest path lengths in
complete graphs.




D. Notation and Recall 

Consider an $n \times n$ matrix, $M$. For any $k$ such that $1 \le k \le n - 1$, let $T_1^k$ represent the smallest
matching of size $k$ in this matrix. A matching is a collection of elements with the property that
no two elements lie in the same row or same column. Note that $T_1^k$ occupies some $k$ columns of
$M$ and w.l.o.g. we can assume it is the first $k$ columns. Now let $S_i^k$ denote the smallest matching
of size $k$ in the $n \times (n - 1)$ matrix obtained from $M$ by the removal of column $i$. Thus one obtains
the $k + 1$ matchings $T_1^k, S_1^k, \ldots, S_k^k$. (Observe that removal of any column outside the first $k$ will
yield $T_1^k$ as the smallest matching.)

Sort the matchings in order of increasing weight to obtain the sequence $T_1^k, \ldots, T_{k+1}^k$.
(Note: $T_1^k$, being the smallest matching of size $k$ in the entire matrix, will be smaller than every
$S_i^k$.) In a slight deviation from the notation in [9], let $t_i^k$ denote the weight of the matching $T_i^k$.
Define $T_1^n$ to be the smallest matching $\pi^*$, and hence $t_1^n = A_n$.

We recall the following result from [9]. 

Theorem 1: The following hold:

• $t_{i+1}^k - t_i^k \sim \exp((n - i + 1)(n - k + i - 1))$

• $\{t_2^k - t_1^k,\ t_3^k - t_2^k,\ \ldots,\ t_{k+1}^k - t_k^k,\ t_1^{k+1} - t_{k+1}^k\}$ are independent

Theorem 1 gives an explicit characterization of the distribution of the difference between
the smallest matching of size $k + 1$ and the smallest matching of size $k$ in terms of sums of
independent exponentials.

Remark: Note that Theorem 1 does not give the entire distribution, as it does not say anything
regarding the dependence between increments at different matching sizes, such as $t_{i+1}^{k+1} - t_i^{k+1}$ and $t_{j+1}^{k} - t_j^{k}$.

II. Conjectures on Correlation 

Consider a set of variables $\Delta_i^k$ defined by the following set of equations. All the random
variables are assumed to be independent.

A^ ~ exp(n(n — A; + 1)), for A; = 1, n 

Afe f w.p. 

{ exp{n-t + l){n- k + i) w.p. 

Now define random variables $r_i^k$ recursively according to the following relations:

It is easy to see that the $r_i^k$'s satisfy the conditions of Theorem 1, i.e.

• $r_{i+1}^k - r_i^k \sim \exp((n - i + 1)(n - k + i - 1))$

• $\{r_2^k - r_1^k,\ r_3^k - r_2^k,\ \ldots,\ r_{k+1}^k - r_k^k,\ r_1^{k+1} - r_{k+1}^k\}$ are independent

Observe that this equivalence of the marginals of the increments also implies $E(t_1^n) = E(r_1^n)$.



Remark: The initial guess was that the distribution of $r_1^n$ was in fact the distribution of $t_1^n$.
However, this was observed not to be true for $n \ge 3$. Calculations for $n = 3$ and $n = 4$
demonstrated that the distributions of $r_1^n$ and $t_1^n$ are very close to each other, though not exactly
equal. Simulations for higher $n$ confirm this observation. This leads us to conjecture that under
the correct scaling (i.e. multiplication by $\sqrt{n}$) the error terms are of lower order and die
down as $n$ becomes large.

Conjecture 1: Let $F_n(x) = P[\sqrt{n}(t_1^n - E(t_1^n)) \le x]$ and let $G_n(x) = P[\sqrt{n}(r_1^n - E(r_1^n)) \le x]$.

Then $|F_n(x) - G_n(x)| \to 0$, $\forall x$, as $n \to \infty$.

If Conjecture 1 is correct, then it would imply that if

$\sqrt{n}(r_1^n - E(r_1^n)) \xrightarrow{d} N(0, 2)$, then
$\sqrt{n}(A_n - E(A_n)) \xrightarrow{d} N(0, 2)$, since $A_n = t_1^n$.

We prove the first claim in the lemma below.

Lemma 1: $\sqrt{n}(r_1^n - E(r_1^n)) \xrightarrow{d} N(0, 2)$.

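Proof: Writing $r_1^n$ in terms of the random variables $\Delta_i^k$, we obtain the following relation:

$$r_1^n = \sum_{k=1}^{n} \sum_{i=1}^{k} (n - k + 1)\,\Delta_i^k.$$

Let $\mu_i^k = E(\Delta_i^k)$ and let $\mu_n = E(r_1^n)$. Then we note the following:

$$\lim_n n\,E\big((r_1^n - \mu_n)^2\big) = \lim_n n \sum_{k=1}^{n} \sum_{i=1}^{k} (n - k + 1)^2\,E\big((\Delta_i^k - \mu_i^k)^2\big) = 2 \qquad (1)$$

$$\lim_n n^2 \sum_{k=1}^{n} \sum_{i=1}^{k} (n - k + 1)^4\,E\big((\Delta_i^k - \mu_i^k)^4\big) = 0 \qquad (2)$$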

The proofs of these two equations were obtained using MATHEMATICA and hence have been
omitted from the paper.

Now we apply the Central Limit Theorem for arrays to finish the argument. Let $X_{n,k,i} =
\sqrt{n}\,(n - k + 1)(\Delta_i^k - \mu_i^k)$. Observe that $\sum_{k,i} X_{n,k,i} = \sqrt{n}(r_1^n - E(r_1^n))$.

Eqns (1) and (2) imply the following conditions for the zero-mean independent random
variables $X_{n,k,i}$:

• $\lim_n \sum_{k,i} E(X_{n,k,i}^2) = 2$.

• $\lim_n \sum_{k,i} E(X_{n,k,i}^4) = 0$.

Hence they satisfy the Lyapunov conditions for the CLT, and thus we have

$$\sum_{k,i} X_{n,k,i} \xrightarrow{d} N(0, 2) \quad \text{as } n \to \infty.$$

This completes the proof of the lemma and hence, assuming Conjecture 1 is true, establishes
the limiting distribution of $A_n$. ■
Remark: Though the Lyapunov CLT is normally stated with the third moment rather than the
fourth moment used here, it is easy to see that any $2 + \delta$ moment is sufficient.
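
For reference, the form of the Lyapunov condition used in the argument above can be written as follows. This is a standard textbook statement for triangular arrays, given here as a sketch in generic notation ($X_{n,j}$, $m_n$, $\sigma^2$, $\delta$ are placeholders, not symbols from the paper); the proof above corresponds to $\delta = 2$ and $\sigma^2 = 2$.

```latex
% Lyapunov CLT for a triangular array, exponent 2 + \delta with \delta > 0.
% For each n, X_{n,1}, ..., X_{n,m_n} are independent with zero mean.
\[
\sum_{j=1}^{m_n} E\big(X_{n,j}^2\big) \to \sigma^2 > 0
\quad\text{and}\quad
\sum_{j=1}^{m_n} E\big(|X_{n,j}|^{2+\delta}\big) \to 0
\;\;\Longrightarrow\;\;
\sum_{j=1}^{m_n} X_{n,j} \xrightarrow{d} N(0, \sigma^2).
\]
```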



Now consider the increment $r_1^{k+1} - r_1^k$. The distribution of this increment can be explicitly
stated in terms of sums of independent exponentials, as stated in Theorem 1. However, from the
definition of the random variables $r_i^k$ we get the following relation:

$$r_1^{k+1} - r_1^{k} = r_1^{k} - r_1^{k-1} + \sum_{i=1}^{k+1} \Delta_i^{k+1}$$

Hence $r_1^{k+1} - r_1^k \ge r_1^k - r_1^{k-1}$. The following lemma shows that this is true for the $t_1^k$'s also.

Lemma 2: $t_1^{k+1} - t_1^k \ge t_1^k - t_1^{k-1}$

Proof: Rearranging the terms, it is sufficient to show that $t_1^{k+1} + t_1^{k-1} \ge 2t_1^k$.

Case 1: If the matching $T_1^{k+1}$ contains one element that lies outside the rows and columns
occupied by $T_1^{k-1}$, then we can combine this element with the matching $T_1^{k-1}$ and get a matching
of size $k$. Note that the rest of the elements of $T_1^{k+1}$ form a matching of size $k$. Therefore we can
identify two matchings of size $k$ from among the elements of $T_1^{k+1}$ and $T_1^{k-1}$. The
combined weight of these two matchings of size $k$ must be at least twice the weight of the
smallest matching of size $k$.

Case 2: When there is no element of $T_1^{k+1}$ that lies outside the rows and columns of $T_1^{k-1}$,
we establish the lemma by using the following two properties of matchings. First, the rows and
columns used by the smallest matching of size $k$ contain all the rows and columns used by
the smallest matching of size $k - 1$. We represent a matching as a bipartite graph in which an edge
is present between node $i$ on the left and node $j$ on the right if the element $(i, j)$ is present in the
matching. The second property is that we can decompose two matchings (represented on the
same bipartite graph) into the following three components: common edges, alternating paths and
alternating cycles.

Consider a bipartite graph formed by the elements of $T_1^{k+1}$ and $T_1^{k-1}$. From the first property,
this is a $(k + 1) \times (k + 1)$ bipartite graph. Color the $k + 1$ edges represented by the elements of
$T_1^{k+1}$ red and the $k - 1$ edges represented by the elements of $T_1^{k-1}$ green. Now from the
minimality of these matchings there cannot be any cycles. The first property also implies that
the alternating paths must be of odd length and must have one extra red edge. (If a path is of even
length or has one extra green edge, then we see that property one is violated.) Therefore, we can
decompose the bipartite graph into common edges and two alternating paths, each having one
extra red edge.

Now form one matching of size $k$ by picking the common edges, the red edges from the first
alternating path and the green edges from the second alternating path. Form the second matching of
size $k$ by picking the common edges, the green edges from the first alternating path and the red edges from
the second alternating path. Observe that the total weight of these two matchings of size $k$ is equal
to $t_1^{k+1} + t_1^{k-1}$. But this must be at least twice the weight of the smallest matching of size
$k$. This completes the proof of the lemma for Case 2. ■
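
The inequality of Lemma 2 can be checked numerically on small matrices. The sketch below is an illustration only: it finds the smallest matching of each size by exhaustive search (feasible only for small $n$) and tests $t_1^{k+1} + t_1^{k-1} \ge 2t_1^k$, with the additional convention, assumed here, that the empty matching has weight 0. The function names are hypothetical.

```python
import itertools
import random


def smallest_matching_weight(matrix, k):
    """Weight of the smallest matching of size k, by exhaustive search."""
    n = len(matrix)
    if k == 0:
        return 0.0  # convention assumed here: empty matching has weight 0
    best = float("inf")
    for rows in itertools.combinations(range(n), k):
        # each ordered choice of k distinct columns pairs with the chosen rows
        for cols in itertools.permutations(range(n), k):
            weight = sum(matrix[r][c] for r, c in zip(rows, cols))
            best = min(best, weight)
    return best


def increments_are_nondecreasing(matrix, tol=1e-12):
    """Check t^{k+1} + t^{k-1} >= 2 t^k for k = 1, ..., n - 1."""
    n = len(matrix)
    t = [smallest_matching_weight(matrix, k) for k in range(n + 1)]
    return all(t[k + 1] + t[k - 1] >= 2 * t[k] - tol for k in range(1, n))


def random_exp_matrix(n, rng):
    """n x n matrix of i.i.d. exp(1) entries."""
    return [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]
```

Calling `increments_are_nondecreasing(random_exp_matrix(4, random.Random(1)))` exercises the check on one random instance; the tolerance `tol` only guards against floating-point rounding.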

III. Limiting Distribution for a special case 

In this section, we assume that the cost matrix has the following form: $C_{i,i} = 0$ for $1 \le i \le
n - 1$. The rest of the entries are assumed to be i.i.d. $\exp(1)$ random variables. Let $A_n$ represent
the weight of the minimum matching when the cost matrix has this structure.

We recall some notation from [9]. Consider an $(n - 1) \times n$ matrix of i.i.d. $\exp(1)$ entries. Let
$T_1$ be the smallest matching of size $n - 1$. W.l.o.g. let it occupy the first $n - 1$ columns. Let $S_i$
be the smallest matching of size $n - 1$ in the $(n - 1) \times (n - 1)$ matrix obtained after deleting
the $i$th column. This gives the matchings $T_1, S_1, \ldots, S_{n-1}$. Now arrange them in increasing order
of their weights to obtain the sequence $T_1, T_2, \ldots, T_n$. Further, as before, let $t_i$ denote the weight
of the matching $T_i$.

We recall the following theorem from [12]. 

Theorem 2: Consider an $(n - 1) \times n$ matrix of i.i.d. $\exp(1)$ entries. Conditioned on a fixed
placement of the minimum in each row, the following hold:

• $t_{i+1} - t_i \sim \exp(i(n - i))$

• $\{t_2 - t_1, \ldots, t_n - t_{n-1}\}$ are independent.

Remark: The proof of this theorem follows the same line of argument as the proof in [9] for the
case without conditioning on the placement of the minima. The details
of the argument are in [11]. Note that the special case of Theorem 2 used below is very
straightforward and can be established without the machinery of [9].

Consider a special case of Theorem 2 where we assume that all the row minima lie in different
columns. Now we form a new matrix by subtracting the minimum entry in each row from all the
entries in the row. By the memoryless property of the exponential distribution, this new matrix
has zeroes where the minimum in each row was present and i.i.d. $\exp(1)$ random
variables in the other locations. W.l.o.g. we can assume that the zero entries are located at $(i, i)$
for $i = 1, \ldots, n - 1$. Also observe that for this new matrix, the weights of its T-matchings are
$\{0, t_2 - t_1, \ldots, t_n - t_1\}$, where the $t_i$ are the weights of the T-matchings of the original matrix.

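Now observe that $A_{n-1}$ has the same distribution as $s_{n-1} - t_1$, where $s_i$ denotes the weight of $S_i$. However, by symmetry, this is
equally likely to be one of $(t_2 - t_1, t_3 - t_1, \ldots, t_n - t_1)$. Therefore from Theorem 2 it follows
that the distribution of $A_{n-1}$ is given by:

$$A_{n-1} \sim \begin{cases}
\exp(n - 1) & \text{w.p. } \frac{1}{n-1} \\
\exp(n - 1) + \exp(2(n - 2)) & \text{w.p. } \frac{1}{n-1} \\
\quad\vdots & \\
\sum_{k=1}^{n-1} \exp(k(n - k)) & \text{w.p. } \frac{1}{n-1}
\end{cases} \qquad (3)$$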
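
As a worked check of the mixture in (3), the following sketch computes its mean exactly, assuming (as the symmetry argument suggests) that each of the $n - 1$ branches has probability $1/(n-1)$ and that branch $j$ is a sum of independent exponentials with rates $k(n-k)$, $k = 1, \ldots, j$. Under these assumptions the mean simplifies to $H_{n-1}/(n-1)$, where $H_m$ is the $m$th harmonic number, because the term $1/(k(n-k))$ appears in exactly $n - k$ branches. The function names are illustrative.

```python
from fractions import Fraction


def mixture_mean(n):
    """Exact mean of the mixture in (3), assuming uniform branch weights."""
    total = Fraction(0)
    for j in range(1, n):  # branch j is sum_{k=1}^{j} exp(k(n-k))
        total += sum(Fraction(1, k * (n - k)) for k in range(1, j + 1))
    return total / (n - 1)


def harmonic(m):
    """m-th harmonic number H_m as an exact Fraction."""
    return sum(Fraction(1, k) for k in range(1, m + 1))
```

For instance, `mixture_mean(3)` equals `Fraction(3, 4)`, which agrees with `harmonic(2) / 2`.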



Consider the following unrelated problem. There is a complete graph on $n$ vertices and its
edge weights are i.i.d. $\exp(1)$ random variables. Let $i$ and $j$ be two randomly chosen vertices.
Let $X_{ij}^n$ denote the weight of the cheapest path from $i$ to $j$. Then the distribution of $X_{ij}^n$ is
given by eqn. (3). This was shown by Janson in [13]. However, a deeper connection between the
problems, other than the algebraic equivalence of the distributions, has proved elusive.

In [13], Janson also computes the asymptotic distribution of the random variable $X_{ij}^n$. He
shows that

$$nX_{ij}^n - \log n \xrightarrow{d} W_1 + W_2 - W_3$$

where $W_1, W_2, W_3$ are independent random variables with the same extreme value distribution

$$P(W_i \le x) = e^{-e^{-x}}.$$

In [14], Aldous had suggested that the asymptotic distribution of $X_{ij}^n$
could possibly be written as

$$nX_{ij}^n - \log n \xrightarrow{d} C$$

where $P(C > x) = E(e^{-e^{x}UV})$, and $U, V$ are independent $\exp(1)$ random variables. The
equivalence of Janson's result and the form suggested by Aldous is quite straightforward.
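
The equivalence can be sketched in a few lines, assuming the form $P(C > x) = E(e^{-e^{x}UV})$ and taking $C = W_1 + W_2 - W_3$ with the $W_i$ independent and $P(W_i \le x) = e^{-e^{-x}}$. The substitutions $U = e^{-W_1}$ and $V = e^{-W_2}$ are introduced here for illustration.

```latex
% Assumed notation: U = e^{-W_1}, V = e^{-W_2}, C = W_1 + W_2 - W_3.
\begin{align*}
P\big(e^{-W_i} > u\big) &= P(W_i < -\log u) = e^{-e^{\log u}} = e^{-u}, \quad u > 0, \\
P(C > x) &= P(W_3 < W_1 + W_2 - x) \\
         &= E\Big[\exp\big(-e^{-(W_1 + W_2 - x)}\big)\Big] \\
         &= E\Big[\exp\big(-e^{x}\, e^{-W_1} e^{-W_2}\big)\Big]
          = E\big(e^{-e^{x} U V}\big).
\end{align*}
```

The first line shows that $U$ and $V$ are $\exp(1)$ random variables; the remaining lines condition on $W_1, W_2$ and use the distribution function of $W_3$.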

Thus, the limiting distribution for the special cost matrix in which all entries on the diagonal
except one are zero and the rest of the entries are i.i.d. $\exp(1)$ entries is given by

$$nA_n - \log n \xrightarrow{d} C$$

where the random variable $C$ is as defined above.

IV. Conclusion 

In this paper we conjecture that the increments of the small matchings in a matrix have a
particular correlation structure in the large $n$ regime. This conjecture, if true, would prove that
the distribution of the smallest matching, when properly scaled and centered,
converges to $N(0, 2)$.

A similar set of conjectures can also be stated for the case where the matrices are rectangular,
and using those correlation structures one could guess the limiting distribution for
an $m \times n$ matrix. Note, however, that one expects the right scaling to be proportional to $\sqrt{n}$ only if
$m$ scales as $\alpha n$ for some $\alpha > 0$.

We also prove the limiting distribution for a special case of the cost matrix. However, this
is based on an algebraic identity between the distributions in two seemingly unrelated problems. It
would be good to understand whether it is something more than a coincidence.